Overview

Dataset statistics

Number of variables: 7
Number of observations: 4000
Missing cells: 2223
Missing cells (%): 7.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 218.9 KiB
Average record size in memory: 56.0 B
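
The two derived figures in these statistics can be reproduced from the raw counts. The sketch below assumes (the report does not say so explicitly) that the missing-cell percentage is taken over all cells, i.e. variables × observations, and that the average record size is the total memory divided by the number of rows. The variable names are illustrative only.

```python
# Raw figures copied from the dataset statistics.
n_variables = 7
n_observations = 4000
missing_cells = 2223
total_size_kib = 218.9

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, averaged over the rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report (7.9% and 56.0 B), which supports that reading of the two statistics.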

Variable types

NUM: 4
CAT: 2
BOOL: 1

Warnings

Height has 1894 (47.4%) missing values
Weight has 326 (8.2%) missing values
PATIENT_ID has unique values

Reproduction

Analysis started: 2020-09-17 13:46:38.843291
Analysis finished: 2020-09-17 13:46:48.004032
Duration: 9.16 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

PATIENT_ID
Real number (ℝ≥0)

UNIQUE

Distinct: 4000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 137605.122
Minimum: 132539
Maximum: 142673
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.2 KiB

Quantile statistics

Minimum: 132539
5-th percentile: 133038.95
Q1: 135075.75
Median: 137592.5
Q3: 140100.25
95-th percentile: 142176.2
Maximum: 142673
Range: 10134
Interquartile range (IQR): 5024.5

Descriptive statistics

Standard deviation: 2923.608886
Coefficient of variation (CV): 0.02124636673
Kurtosis: -1.191489871
Mean: 137605.122
Median Absolute Deviation (MAD): 2513
Skewness: 0.00584744361
Sum: 550420488
Variance: 8547488.92
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
135162               1      < 0.1%
139889               1      < 0.1%
142205               1      < 0.1%
141960               1      < 0.1%
133764               1      < 0.1%
135811               1      < 0.1%
139905               1      < 0.1%
142106               1      < 0.1%
135803               1      < 0.1%
137850               1      < 0.1%
Other values (3990)  3990   99.8%

Value   Count  Frequency (%)
132539  1      < 0.1%
132540  1      < 0.1%
132541  1      < 0.1%
132543  1      < 0.1%
132545  1      < 0.1%

Value   Count  Frequency (%)
142673  1      < 0.1%
142671  1      < 0.1%
142670  1      < 0.1%
142667  1      < 0.1%
142665  1      < 0.1%

ihd
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.2 KiB
Value  Count  Frequency (%)
0      3446   86.2%
1      554    13.9%

Age
Real number (ℝ≥0)

Distinct: 76
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.2475
Minimum: 15
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.2 KiB

Quantile statistics

Minimum: 15
5-th percentile: 29
Q1: 52.75
Median: 67
Q3: 78
95-th percentile: 89
Maximum: 90
Range: 75
Interquartile range (IQR): 25.25

Descriptive statistics

Standard deviation: 17.56094646
Coefficient of variation (CV): 0.2733327594
Kurtosis: -0.3325881438
Mean: 64.2475
Median Absolute Deviation (MAD): 12
Skewness: -0.6062680114
Sum: 256990
Variance: 308.3868405
Monotonicity: Not monotonic
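
Several of the Age statistics are simple functions of the others, which gives a quick consistency check. The sketch below uses only figures copied from this section and assumes the usual definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, range = maximum − minimum, IQR = Q3 − Q1). The variable names are illustrative.

```python
import math

# Figures copied from the Age section of the report.
n = 4000                  # observations; Age has no missing values
total = 256990            # Sum
std = 17.56094646         # Standard deviation
minimum, maximum = 15, 90
q1, q3 = 52.75, 78

mean = total / n                 # reported: 64.2475
cv = std / mean                  # reported: 0.2733327594
variance = std ** 2              # reported: 308.3868405
value_range = maximum - minimum  # reported: 75
iqr = q3 - q1                    # reported: 25.25

for name, value in [("mean", mean), ("cv", cv), ("variance", variance),
                    ("range", value_range), ("iqr", iqr)]:
    print(f"{name}: {value:.6g}")
```

The same relationships can be checked for the other numeric variables; for Height and Weight, n would be the number of non-missing observations rather than 4000.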
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
90                 164    4.1%
77                 126    3.1%
78                 104    2.6%
83                 102    2.5%
79                 100    2.5%
74                 93     2.3%
72                 90     2.2%
80                 89     2.2%
73                 88     2.2%
81                 86     2.1%
Other values (66)  2958   74.0%

Value  Count  Frequency (%)
15     1      < 0.1%
16     1      < 0.1%
17     4      0.1%
18     12     0.3%
19     18     0.4%

Value  Count  Frequency (%)
90     164    4.1%
89     41     1.0%
88     51     1.3%
87     49     1.2%
86     65     1.6%

Gender
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 3
Missing (%): 0.1%
Memory size: 31.2 KiB
Value      Count  Frequency (%)
male       2246   56.1%
female     1751   43.8%
(Missing)  3      0.1%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 6
Median length: 4
Mean length: 4.87475
Min length: 3
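
The minimum length of 3 is shorter than either category label ("male" has 4 characters, "female" has 6). One plausible explanation, which is an assumption and not something the report states, is that the 3 missing values are measured as the 3-character string "nan". The sketch below checks whether the reported mean length is consistent with that reading, using the counts from this section.

```python
# Category counts copied from the Gender section; 3 values are missing.
counts = {"male": 2246, "female": 1751}
n_missing = 3

# Assumption: each missing value is measured as the string "nan".
missing_label = "nan"

total_chars = sum(len(label) * count for label, count in counts.items())
total_chars += len(missing_label) * n_missing
n_rows = sum(counts.values()) + n_missing

mean_length = total_chars / n_rows
print(mean_length)  # 4.87475
```

The result matches the reported mean length, so the length statistics appear to include the missing entries.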

Height
Real number (ℝ≥0)

MISSING

Distinct: 70
Distinct (%): 3.3%
Missing: 1894
Missing (%): 47.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 169.787227
Minimum: 1.8
Maximum: 431.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.2 KiB

Quantile statistics

Minimum: 1.8
5-th percentile: 152.4
Q1: 162.6
Median: 170.2
Q3: 177.8
95-th percentile: 185.4
Maximum: 431.8
Range: 430
Interquartile range (IQR): 15.2

Descriptive statistics

Standard deviation: 20.17460395
Coefficient of variation (CV): 0.1188228603
Kurtosis: 77.10913097
Mean: 169.787227
Median Absolute Deviation (MAD): 7.6
Skewness: 3.303175323
Sum: 357571.9
Variance: 407.0146444
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
177.8              213    5.3%
182.9              188    4.7%
170.2              173    4.3%
172.7              169    4.2%
167.6              162    4.0%
162.6              153    3.8%
157.5              142    3.5%
175.3              137    3.4%
165.1              124    3.1%
180.3              115    2.9%
Other values (60)  530    13.2%
(Missing)          1894   47.3%

Value  Count  Frequency (%)
1.8    1      < 0.1%
13     3      0.1%
13.7   1      < 0.1%
14     2      0.1%
15.2   1      < 0.1%

Value  Count  Frequency (%)
431.8  1      < 0.1%
426.7  1      < 0.1%
419.1  1      < 0.1%
406.4  1      < 0.1%
398.8  1      < 0.1%

ICUType
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 31.2 KiB
Value                          Count  Frequency (%)
Medical ICU                    1481   37.0%
Surgical ICU                   1068   26.7%
Cardiac Surgery Recovery Unit  874    21.9%
Coronary Care Unit             577    14.4%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 29
Median length: 12
Mean length: 16.20975
Min length: 11

Weight
Real number (ℝ≥0)

MISSING

Distinct: 836
Distinct (%): 22.8%
Missing: 326
Missing (%): 8.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.47826892
Minimum: 21.7
Maximum: 300
Zeros: 0
Zeros (%): 0.0%
Memory size: 31.2 KiB

Quantile statistics

Minimum: 21.7
5-th percentile: 50.5
Q1: 66
Median: 78.6
Q3: 92
95-th percentile: 122
Maximum: 300
Range: 278.3
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 23.62843246
Coefficient of variation (CV): 0.2899967411
Kurtosis: 7.409920777
Mean: 81.47826892
Median Absolute Deviation (MAD): 13
Skewness: 1.704291978
Sum: 299351.16
Variance: 558.3028205
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
70                  98     2.5%
80                  79     2.0%
90                  68     1.7%
60                  61     1.5%
65                  56     1.4%
75                  54     1.4%
100                 53     1.3%
85                  43     1.1%
77                  32     0.8%
82                  31     0.8%
Other values (826)  3099   77.5%
(Missing)           326    8.2%

Value  Count  Frequency (%)
21.7   1      < 0.1%
31.7   1      < 0.1%
32     1      < 0.1%
34.6   1      < 0.1%
35     1      < 0.1%

Value  Count  Frequency (%)
300    1      < 0.1%
280    1      < 0.1%
253    1      < 0.1%
230    1      < 0.1%
220    3      0.1%

Interactions

[Figures: 16 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
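
As a minimal sketch of that calculation, the function below computes r directly from the definition using only the standard library. The function name and the sample data are made up for illustration; it assumes two equal-length sequences with no missing values and non-zero variance, and it is not the code the report itself runs.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population (1/n) forms are used throughout; the 1/n factors cancel in the
    # ratio, so the sample (1/(n-1)) forms would give the same r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)  # raises ZeroDivisionError for constant input

# Made-up data: y is an exact increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2 * v + 5 for v in x]
print(pearson_r(x, y))
# Shifting and rescaling x leaves r unchanged, up to floating-point rounding.
print(pearson_r([10 * v - 3 for v in x], y))
```

Both calls give a value of 1 up to rounding, which illustrates the invariance under changes of location and scale.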

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
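
The sketch below follows that recipe with the standard library only: it ranks each variable, then applies the covariance-over-standard-deviations formula to the ranks. Tied values receive the average of their positions, which is a common convention but an assumption here; the function names and data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables over the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)

# Made-up data: y grows monotonically but nonlinearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```

Because the relationship is perfectly monotonic, ρ comes out at 1 (up to rounding) even though the relationship is far from linear.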

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
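
A direct, quadratic-time sketch of that definition is shown below. It implements the plain form described here (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations often use the tie-adjusted tau-b variant instead, so results can differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # The sign of the product tells whether both variables move the same way.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in x or y; it counts toward neither total.
    return (concordant - discordant) / len(pairs)

# Made-up data: 6 pairs in total, of which exactly one (the middle two points)
# is discordant, so tau = (5 - 1) / 6.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))
```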

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
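
For illustration, the sketch below computes a bias-corrected Cramér's V with the standard library, using the correction as it is commonly stated for Bergsma's proposal: the squared phi coefficient and the table dimensions are each adjusted by terms of order 1/(n − 1). This is a hedged reconstruction from that formula, not the report's own implementation; the function name and data are made up.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if r < 2 or k < 2:
        raise ValueError("each variable needs at least 2 distinct categories")

    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))

# Made-up data: the first pair of variables is perfectly associated,
# the second pair is exactly independent.
print(cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50))
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25))
```

The first call gives 1 up to rounding and the second gives 0, matching the two ends of the range described above.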

Missing values

[Figures: 4 missing-value plots]

Sample

First rows

Index  PATIENT_ID  ihd  Age  Gender  Height  ICUType                        Weight
0      132539      0    54   female  NaN     Surgical ICU                   NaN
1      132540      0    76   male    175.3   Cardiac Surgery Recovery Unit  76.0
2      132541      0    44   female  NaN     Medical ICU                    56.7
3      132543      0    68   male    180.3   Medical ICU                    84.6
4      132545      0    88   female  NaN     Medical ICU                    NaN
5      132547      0    64   male    180.3   Coronary Care Unit             114.0
6      132548      0    68   female  162.6   Medical ICU                    87.0
7      132551      1    78   female  162.6   Medical ICU                    48.4
8      132554      0    64   female  NaN     Medical ICU                    60.7
9      132555      0    74   male    175.3   Cardiac Surgery Recovery Unit  66.1


Last rows

Index  PATIENT_ID  ihd  Age  Gender  Height  ICUType             Weight
3990   142655      0    43   male    NaN     Medical ICU         92.9
3991   142659      0    88   male    NaN     Coronary Care Unit  90.7
3992   142661      0    89   male    177.8   Surgical ICU        64.0
3993   142662      0    86   male    162.6   Medical ICU         53.0
3994   142664      0    51   female  NaN     Surgical ICU        75.0
3995   142665      0    70   female  NaN     Surgical ICU        87.0
3996   142667      0    25   male    NaN     Medical ICU         166.4
3997   142670      0    44   male    NaN     Medical ICU         109.0
3998   142671      1    37   male    NaN     Medical ICU         87.4
3999   142673      0    78   female  157.5   Surgical ICU        70.7